canon: add telemetry-validation-gate constraint #210
Merged
Adds tier-1 canon defining the single smoke-and-verify gate for the telemetry Emission Contract: enumerate registered tools, drive one synthetic call per tool per surface, compare emitted bytes/tokens against locally-computed expectations. No time bound. No statistical threshold. Sample size of 1 per tool per surface is sufficient because the wrapper is deterministic. Supersedes the implicit '24-hour soak' framing in odd/handoffs/2026-05-14-telemetry-coverage-completeness, which assumed organic load oddkit does not actually receive. Notes that release-validation-gate Rule 2 is arguably not triggered by wrapper-only changes (no response-envelope change, no tool add/remove). If a future wrapper change touches load-bearing surface in the Rule 2 sense, both gates apply. derives_from telemetry-governance, release-validation-gate, performed-prudence-anti-pattern.
Canon Quality — Frontmatter Schema ✅ All 41 file(s) in Validator
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Gate procedure computes expected values from wrong inputs
- Updated steps 2 and 3 of the gate procedure to record and measure the in-memory `args` object and `{ content: [...] }` envelope (matching the wrapper's actual emission inputs per telemetry-governance Rule 2) instead of the full HTTP request/response bodies.
Preview (e46ca204fd)
diff --git a/canon/constraints/telemetry-validation-gate.md b/canon/constraints/telemetry-validation-gate.md
new file mode 100644
--- /dev/null
+++ b/canon/constraints/telemetry-validation-gate.md
@@ -1,0 +1,114 @@
+---
+uri: klappy://canon/constraints/telemetry-validation-gate
+title: "Telemetry Validation Gate — Smoke Every Tool, Verify Every Number"
+audience: canon
+exposure: nav
+tier: 1
+voice: neutral
+stability: evolving
+tags: ["canon", "constraint", "telemetry", "validation", "smoke-test", "wrapper-correctness", "release-pipeline", "analytics-engine"]
+epoch: E0008
+date: 2026-05-15
+derives_from: "canon/constraints/telemetry-governance.md, canon/constraints/release-validation-gate.md, canon/observations/performed-prudence-anti-pattern.md"
+complements: "canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern.md, canon/observations/2026-05-14-telemetry-coverage-gap-quantified.md"
+governs: "Every release that touches the telemetry Emission Contract surface in oddkit and TruthKit"
+status: active
+---
+
+# Telemetry Validation Gate — Smoke Every Tool, Verify Every Number
+
+> The Emission Contract requires every registered tool to emit accurate metered usage on every call. Verifying it is one smoke pass per surface: hit every tool, compare the emitted `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` against the request and response that were actually sent. If the numbers match expectations within tokenizer noise (3–4% for `cl100k_base`), the wrapper is working. There is no soak period, no organic-load threshold, no statistical sample bar. Synthetic traffic is the only traffic; the wrapper is deterministic; one call per tool is sufficient.
+
+---
+
+## Summary — Stop Pretending Sample Size Buys Confidence
+
+oddkit's hosted service does not see enterprise-scale organic traffic. Real consumers number in the low single digits at any given moment, and most of those are the maintainer themselves. A validation model built around "wait for 24 hours of organic load and check per-tool coverage at 95%" is performed prudence — it inflates statistical ceremony around a question that does not need statistics to answer.
+
+The actual question is: does the per-tool wrapper emit the correct metered values when a known payload passes through it? That question is deterministic. The wrapper is code. Either it reads the JSON-stringified args and envelope, runs `cl100k_base` over them, and writes the result to Analytics Engine — or it doesn't. One call with a known input and known output answers the question completely.
+
+The gate is therefore: drive a synthetic smoke pass across every registered tool on every active deployment surface (main preview and prod after promotion). For each call, compare the emitted numeric fields against what the smoke driver actually sent and received. Tokenizer noise of 3–4% for English-prose payloads is the only legitimate variance; anything else is a bug.
+
+Sample size is one per tool per surface. Increase it for operator margin if desired, but the canon bar is one. There is no time bound. There is no organic-load requirement. If the smoke pass shows accurate numbers across every tool, the wrapper is verified.
+
+---
+
+## The Gate
+
+**When:** After any PR touching `withTelemetry`, tool registration, or the emission envelope is deployed to a surface — main preview after merge to `main`, or prod after the `main → prod` promotion. Run the gate against each surface the change reaches, before declaring that surface verified.
+
+**Question it answers:** Does the wrapper emit accurate `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` for every registered tool?
+
+**Procedure:**
+
+1. Enumerate every `server.tool()` registration in `workers/src/index.ts`. This is the smoke target list.
+2. Drive one synthetic call per tool through the surface's `/mcp` endpoint. Record the exact `args` object sent (the JSON-RPC `params.arguments` payload) and the exact `{ content: [...] }` envelope returned by the handler — not the full HTTP request/response bodies, which include JSON-RPC framing the wrapper does not see.
+3. For each call, compute the expected values locally against the same in-memory values the wrapper measures per `klappy://canon/constraints/telemetry-governance` Rule 2: `bytes_in = utf8_byte_length(JSON.stringify(args))`, `bytes_out = utf8_byte_length(JSON.stringify(content_envelope))`, `tokens_in = cl100k_count(JSON.stringify(args))`, `tokens_out = cl100k_count(JSON.stringify(content_envelope))`. For SSE-streamed responses, expected `bytes_out = 0` and `tokens_out = 0` per the Emission Contract.
+4. Query `oddkit_telemetry` with `event_type = 'tool_call'`, `worker_version = <surface-version>`, and a timestamp window covering the smoke run.
+5. Match each emitted row to the corresponding smoke call (by tool name and timing). Compare emitted versus expected on all four fields.
+
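As an illustrative sketch only (not the actual oddkit smoke driver), steps 2, 3, and 5 can be expressed as follows. `cl100kCount` is a stand-in for a real `cl100k_base` tokenizer such as js-tiktoken; it is stubbed here so the sketch stays self-contained, and a real gate run must use the real encoding.

```typescript
// Illustrative sketch of steps 2, 3, and 5 (not the actual oddkit driver).
// Expected values are computed from the same in-memory inputs the wrapper
// measures: the validated `args` object and the `{ content: [...] }` envelope.

type Expected = {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
};

const utf8ByteLength = (s: string): number => new TextEncoder().encode(s).length;

// Stand-in tokenizer: a real gate must run cl100k_base (e.g. via js-tiktoken).
const cl100kCount = (s: string): number => Math.ceil(s.length / 4);

function expectedFor(args: unknown, envelope: unknown, sse = false): Expected {
  const argsJson = JSON.stringify(args);
  const envJson = JSON.stringify(envelope);
  return {
    bytes_in: utf8ByteLength(argsJson),
    bytes_out: sse ? 0 : utf8ByteLength(envJson), // SSE-streamed: 0 per the contract
    tokens_in: cl100kCount(argsJson),
    tokens_out: sse ? 0 : cl100kCount(envJson),
  };
}

// Step 5: emitted vs expected, within tokenizer noise (±5%).
const withinNoise = (emitted: number, expected: number, tol = 0.05): boolean =>
  expected === 0 ? emitted === 0 : Math.abs(emitted - expected) / expected <= tol;
```

Under these assumptions each queried `oddkit_telemetry` row is checked field by field against `expectedFor(...)` for the matching smoke call, with the SSE branch covering streamed responses.
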
+**Pass:** Every registered tool appears in the telemetry dataset, and every emitted numeric field is within tokenizer noise (±5%) of the expected value computed locally.
+
+**Fail (missing tool):** Any registered tool is absent from the dataset after smoke. The wrapper is not attached to that registration. Block downstream work on this surface; fix forward.
+
+**Fail (wrong number):** Any emitted field is off by more than the noise floor. The wrapper is attached but emission is inaccurate. Investigate; fix; re-smoke.
+
+**Sample threshold:** One call per tool per surface is sufficient. The wrapper is deterministic; a second call with the same input emits the same output. Higher sample counts are operator discretion for cutover margin, not canon requirement.
+
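For illustration, the JSON-RPC framing a hypothetical smoke driver would POST to a surface's `/mcp` endpoint in step 2 can be built like this; `buildCallBody` is a sketch, and only the `tools/call` field shape follows MCP's JSON-RPC framing:

```typescript
// Hypothetical smoke-driver helper: builds the JSON-RPC tools/call body the
// driver POSTs to a surface's /mcp endpoint. The wrapper never sees this outer
// framing; it measures only params.arguments and the handler's envelope.
function buildCallBody(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  });
}
```

The driver records the `args` object and the returned result envelope, not this outer body, so the locally computed expectations line up with what the wrapper actually measured.
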
+---
+
+## Why No Time Bound
+
+oddkit's hosted service receives sparse, mostly maintainer-driven traffic. "Wait 24 hours and check organic coverage" is a pattern borrowed from systems where organic traffic actually fills the sample space. Here it does not. A 24-hour window after promotion produces a dataset dominated by maintainer test calls and a handful of synthetic probes — the same data the smoke pass produces immediately, just delayed.
+
+Time bounds are appropriate for systems where the question is whether the wrapper behaves correctly under unforeseen load patterns the operator cannot manufacture — a real concern for services running thousands of QPS across heterogeneous clients. oddkit answers a smaller question: do the numbers come out right for the payloads we send? That is fully answered by deliberate exercise.
+
+Removing the time bound also removes a class of failure mode: orchestrators waiting passively for a soak window to mature, mistaking elapsed time for validation work. The smoke pass is active verification with a definite endpoint.
+
+---
+
+## Why Synthetic Is Enough
+
+The Emission Contract specifies in-memory measurement after Zod validation and before MCP transport framing. The wrapper does not care whether the call originated from a manufactured smoke probe or a real consumer; it sees the same `args` object and the same `{ content: [...] }` envelope. Synthetic and organic traffic produce identical telemetry rows when the payload sizes match.
+
+Synthetic traffic has an additional advantage that organic does not: the smoke driver knows the exact request and response bytes locally. Organic traffic only produces emitted values in the dataset; the ground truth is not directly observable. Verification against organic load is necessarily a sanity check against expected ranges, not against known values. The smoke pass is the stricter test.
+
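A minimal sketch of that identity, with schematic names (`withTelemetry` and `emit` here are illustrative, not the actual oddkit implementation):

```typescript
// Schematic per-tool wrapper: measures the validated args object and the
// { content: [...] } envelope in memory, after validation and before any
// transport framing. Synthetic and organic calls hit the same code path.
type Envelope = { content: unknown[] };
type Row = { tool: string; bytes_in: number; bytes_out: number };

const byteLen = (v: unknown): number =>
  new TextEncoder().encode(JSON.stringify(v)).length;

function withTelemetry(
  tool: string,
  handler: (args: unknown) => Envelope,
  emit: (row: Row) => void,
): (args: unknown) => Envelope {
  return (args) => {
    const envelope = handler(args);
    // Same measurement regardless of traffic origin.
    emit({ tool, bytes_in: byteLen(args), bytes_out: byteLen(envelope) });
    return envelope;
  };
}
```

Because the measurement happens on the in-memory values, a smoke probe and a real consumer sending the same payload produce byte-identical telemetry rows.
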
+---
+
+## Cross-Surface Coverage
+
+The wrapper deploys to whichever surface receives the code. Currently that is two surfaces:
+
+- **Main preview** at `https://main-oddkit.klappy.workers.dev/mcp` — auto-deployed by Cloudflare on every merge to `main` in `klappy/oddkit`.
+- **Production** at `https://oddkit.klappy.dev/mcp` — deployed when the `main → prod` promotion PR merges.
+
+Each surface must be smoke-verified independently. Verifying main preview does not verify prod; the surfaces run independent worker versions and could in principle diverge.
+
+When the program adds TruthKit or any other oddkit-pattern MCP server, the same gate applies to each of those surfaces.
+
+---
+
+## Relationship to release-validation-gate Rule 2
+
+`klappy://canon/constraints/release-validation-gate` Rule 2 requires fresh-context validator dispatch on promotion PRs that touch load-bearing surface. "Load-bearing surface" is defined there by response-envelope changes, new or removed tool registrations, governance file reads, matcher algorithm changes, and `workers/src/orchestrate.ts` modifications. The telemetry wrapper does not change any of these — callers observe identical responses; no tools are added or removed; no governance reads change.
+
+A wrapper change is therefore arguably outside Rule 2's trigger. The orchestrator may smoke-verify directly per this gate without dispatching a fresh-context validator, provided the smoke pass shows accurate numbers across every tool on every surface.
+
+If a future wrapper change *does* touch load-bearing surface (for example, exposing new envelope fields to callers), Rule 2 fires in addition to this gate, and both must be satisfied.
+
+---
+
+## Receipts
+
+- `klappy://canon/observations/2026-05-14-telemetry-coverage-gap-quantified` — the diagnostic that motivated the Emission Contract and exposed how prior time-bound validation hid the actual coverage problem.
+- `klappy://canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern` — decision record for the wrapper architecture this gate verifies.
+- `klappy://canon/observations/performed-prudence-anti-pattern` — the failure mode this gate is structured to avoid (statistical ceremony around a deterministic question).
+- `klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness` — original handoff whose "24-hour soak" framing this canon supersedes.
+
+---
+
+## See Also
+
+- `klappy://canon/constraints/telemetry-governance` — the Emission Contract this gate verifies.
+- `klappy://canon/constraints/release-validation-gate` — separate constraint covering promotion-PR fresh-context review.
+- `klappy://canon/constraints/measure-before-you-object` — the methodology that argues against theoretical objections to empirical answers; applies here against statistical-threshold arguments to deterministic questions.
Reviewed by Cursor Bugbot for commit 3f52f07.

Adds tier-1 canon defining the single gate for verifying the telemetry Emission Contract.
Why this exists
The handoff at `klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness` ran a "24-hour soak validator" framing on the post-PR-#157 cutover. In session it became clear that framing is incoherent for oddkit specifically: there is no organic load to soak against, and "wait for organic ≥95% coverage on every tool" is unmeetable against manufactured smoke traffic. The actual question — does the wrapper emit the numbers we expect for the payloads we send? — is deterministic and answerable in a single smoke pass per surface.
What the gate is
One synthetic call per `server.tool()` registration, per surface; verify emitted `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` (cl100k_base) against locally computed values. No time bound. No sample threshold beyond 1/tool/surface. No statistical ceremony.
Relationship to release-validation-gate
Rule 2 there triggers on response-envelope changes, tool add/remove, governance-read changes, and orchestrate.ts edits. Wrapper-only changes touch none of these — callers see identical responses. This PR notes the orchestrator may smoke-verify directly per this new gate when Rule 2 is not triggered. If a future wrapper change does touch load-bearing surface in the Rule 2 sense, both gates apply.
Gauntlet (Writing Canon checklist)
- derives_from

Receipts

- `klappy://canon/observations/2026-05-14-telemetry-coverage-gap-quantified` — diagnostic
- `klappy://canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern` — decision record
- `klappy://canon/observations/performed-prudence-anti-pattern` — the failure mode this gate avoids
- `klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness` — superseded soak framing

Note
Low Risk
Low risk: adds a new tier-1 canon constraint document only, with no code or runtime behavior changes.
Overview
Adds a new tier-1 canon constraint, `telemetry-validation-gate`, defining the required release gate for validating the telemetry Emission Contract.

The gate replaces time-bound/organic-traffic "soak" expectations with a single synthetic smoke call per registered tool per deployment surface, and requires comparing emitted `bytes_in`/`bytes_out`/`tokens_in`/`tokens_out` against locally computed ground truth (with an explicit noise tolerance and SSE streaming exception). It also clarifies how this gate interacts with `release-validation-gate` Rule 2 (wrapper-only changes can be verified via smoke without triggering fresh-context validation unless load-bearing surface changes).

Reviewed by Cursor Bugbot for commit e46ca20.